Many recent large-scale studies of interaction networks have focused on networks of accumulated 
contacts. In this paper we explore social networks of ongoing relationships with an emphasis on 
dynamical aspects. We find a distribution of response times (times between consecutive contacts of 
different direction between two actors) that has a power-law shape over a large range. We also argue 
that the distribution of relationship duration (the time between the first and last contacts between 
actors) is exponentially decaying. Methods to reanalyze the data to compensate for the finite sampling 
time are proposed. We find that the degree distribution for networks of ongoing contacts fits better 
to a power law than the degree distribution of the network of accumulated contacts does. We see that 
the clustering and assortative mixing coefficients are of the same order for networks of ongoing and 
accumulated contacts, and that the structural fluctuations of the former are rather large. 



PACS numbers: 89.65.-s,89.75.Hc,89.75.-k 



I. INTRODUCTION 



The recent development in database technology has allowed researchers to extract very large data
sets of human interaction sequences. These large data sets are suitable to the methods and modeling
techniques of statistical physics, and thus recent years have witnessed the appearance of an
interdisciplinary field between physics and sociology [...]. More specifically, these studies have
focused on network structure: in what ways the networks of social interaction deviate from
completely random networks, and how this structure can emerge from individual behavior. Most^1 of
these recent large-scale social network studies have focused on networks of accumulated
relationships. In many cases, the social network of interest is rather the network of ongoing social
relationships: the dynamics of the spreading of diseases (2), opinion formation (3), and fads (19)
are often rather fast compared to the evolution of the network, and in such cases inactive
relationships have no relevance. In social search processes (20), distant acquaintances can be
helpful, but not all acquaintances a person has ever had. We also believe the network of ongoing
contacts lies conceptually closer to the colloquial idea of a network of friends than the network of
actors and their accumulated contacts does. Furthermore, traditional social network studies (e.g.
Refs. [...], 17) based on interviews or field surveys have mapped out ongoing contacts. The
complication, and probably the reason earlier studies have focused on the network of accumulated
contacts, is that the time of a tie's cessation is less clear-cut than its beginning. However, if
the sampling time of the data set is very large compared to the network dynamics, then one can, at a
time t in the interior of the sampling time span, approximate the network of ongoing relationships
by the network of contacts that have occurred and will occur again. In the present paper we use this
method to study the structure and structural fluctuations of networks of ongoing relationships. To
justify that the sampling time is long enough compared to the time evolution of the network, we
investigate the temporal structure of the relationships. The data sets we use are obtained from
scientific collaborations (11), email exchange (8) and interaction within an Internet community (10).

^1 To our knowledge, the only type of large (relatively) instantaneous network figuring in recent
physics literature is networks of corporate directors sitting on the same board, Ref. (6).



II. NOTATIONS AND NETWORK CONSTRUCTION 

All our data sets take the form of lists of triples, or contacts, (v_a, v_b, t), meaning that v_a
and v_b have interacted at time t. For the scientific collaboration networks the first two arguments
are unordered; for the other two networks the interaction is directed. We call the set of contacts
with the same first two elements (neglecting the order) a relationship between v_a and v_b. Our
approximation of the graph of ongoing contacts at time t is then defined as G(t) = (V(t), E(t)),
where V(t) is the set of vertices (or actors) that occur in a contact at a time earlier than t, and
E(t) is the set of unordered pairs of vertices (v_a, v_b) for which there exist contacts between v_a
and v_b at times t' and t'' such that t' < t < t''.
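The definition of G(t) above can be illustrated with a short Python sketch. This is only a minimal illustration, not the code used for the analysis: the function name ongoing_network and the representation of contacts as (v_a, v_b, time) tuples are assumptions made here, as is the choice to skip self-contacts when forming edges.

```python
def ongoing_network(contacts, t):
    """Approximate graph of ongoing contacts at time t.

    contacts: iterable of (v_a, v_b, time) triples (hypothetical input format).
    Returns (V, E): V is the set of actors seen in a contact before t,
    E the set of unordered pairs with contacts both before and after t.
    """
    first, last = {}, {}
    vertices = set()
    for v_a, v_b, time in contacts:
        if time < t:
            vertices.update((v_a, v_b))
        if v_a == v_b:
            continue  # assumption: self-contacts never form a tie
        pair = frozenset((v_a, v_b))  # the direction of the contact is neglected
        first[pair] = min(time, first.get(pair, time))
        last[pair] = max(time, last.get(pair, time))
    # A pair is an edge iff some contact lies strictly before t and some strictly after t.
    edges = {pair for pair in first if first[pair] < t < last[pair]}
    return vertices, edges
```

Since only the earliest and latest contact of each relationship enter the condition t' < t < t'', one pass over the contact list suffices for a given t.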

For the network of scientific collaborations we use data similar to that used in Ref. (11) (but
sampled one year longer). This data is extracted from the preprint repository arxiv.org, where
scientists themselves can upload manuscripts. An edge between v_a and v_b means that v_a has
appeared as a coauthor of a preprint together with v_b. The time the manuscript is uploaded is the
time we say the collaboration has occurred.

The email network is the same data set as presented in Ref. (8) and consists of all incoming and
outgoing email traffic to a server handling undergraduate students' email accounts in Kiel, Germany.

TABLE I Statistics of the networks. Date notations have the format year-month-day hour:minute:second GMT. The number of ties
does not include self-communication (e.g. self-addressed e-mails) but in "number of contacts" such communication is included.

                                e-prints              e-mail                pussokram.com
start of sampling               1995-01-01 06:00:00   2001-07-29 03:11:33   2001-02-13 14:39:25
end of sampling                 2001-01-01 05:31:00   2001-11-18 02:06:28   2002-07-10 15:28:00
sampling duration, t_stop       2192.0 days           112.0 days            512.0 days
number of actors, N             58342                 64370                 29341
number of ties, M               294901                97425                 115684
number of contacts              530481                447543                536276
relationship duration, t_dur    1532(30) days         187(5) days           129(10) days

The Internet community network is constructed from the same data set as in Ref. (10). Here an edge
represents any of four different kinds of contact between users of the Swedish Internet community
pussokram.com. This community is intended for romantic communication among adolescents and young
adults.

For the email and pussokram.com networks one can define a direction for the contacts. In the study
of network structure, however, we will consider the contacts as bidirectional. Statistics for the
networks are presented in Table I.



III. RELATIONSHIP DYNAMICS AND THE SPEED OF NETWORK CHANGE

Before we investigate the approximate network of ongoing contacts, as defined above, we discuss the
speed of interaction and the validity of the approximation. First, we focus on the distribution of
response times τ: times between consecutive contacts of different direction within a
relationship.^2 For the undirected e-print data we simply define τ as the time between consecutive
uploads of e-prints within a relationship. We measure the τ-distribution of the data sets, p', and
also a quantity p where the effects of the finite sampling time are compensated for. An earlier
study (9) has found a power-law like τ-distribution. As shown in Fig. 1(a), this picture is
confirmed in the large scale. This stretched functional form makes the finite sampling time a
problem, as it imposes a cut-off on the recorded distribution p'. To compensate for this and
construct a better approximation p to the real distribution, we use the formula
P(B_τ) = P(A ∩ B_τ)/P(A|B_τ), where A is the event that a response interval that starts within the
sampling interval I_t = [0, t_stop] also ends within I_t, and B_τ is the statement that the response
time is τ. Now P(A ∩ B_τ) is just the frequency distribution of interval length as measured during
the sampling. To find P(A|B_τ) we note that, if we assume that contacts occur with a constant rate
(which is reasonable in a long term perspective for a system of a relatively constant number of
actors), then a response interval ends within I_t with probability

    1 - \tau / t_{stop} .    (1)

^2 As mentioned, we will focus on undirected networks later, but for comparison with other works we
use directed contacts in this definition. The conclusion from an undirected definition would be the
same.

FIG. 1 Response time statistics. p' is the frequency of response times in the data sets; p is the
recalculated quantity compensated for the finite sampling time. (a) shows p for all data sets in log
scales. The data is log-binned. p' is plotted for the last five bins; elsewhere p' and p overlap to
a great extent. A blow-up of p (in linear scale) is shown in (b) (e-mails) and (c) (pussokram.com).
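Under the constant-rate assumption, the compensation amounts to dividing each recorded frequency p'(τ) by the factor of Eq. (1). The following Python sketch illustrates this step only; the function name compensate_finite_sampling and the list-based histogram format are assumptions made for the illustration, and the result is left unnormalized.

```python
def compensate_finite_sampling(taus, p_observed, t_stop):
    """Divide the recorded frequencies p'(tau) by Eq. (1), 1 - tau / t_stop.

    taus: representative response time of each histogram bin (hypothetical format).
    p_observed: recorded frequency p' of each bin.
    Returns the compensated, unnormalized values p for each bin.
    """
    if len(taus) != len(p_observed):
        raise ValueError("taus and p_observed must have the same length")
    corrected = []
    for tau, p_obs in zip(taus, p_observed):
        inside = 1.0 - tau / t_stop  # probability that the interval ends within I_t
        if inside <= 0.0:
            raise ValueError("response times must be shorter than t_stop")
        corrected.append(p_obs / inside)
    return corrected
```

The correction is negligible for τ much smaller than t_stop and grows as τ approaches t_stop, which is why only the last bins of a log-binned histogram change noticeably.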



However, the sizes of the communities need not be time independent. For response intervals involving
an actor that enters the system at time t, Eq. (1) becomes 1 - (t + τ)/t_stop. Now we approximate
the time an actor v enters the system with the first time t_v that v is involved in a contact, and
get the formula

    P(A|B_\tau) = a \sum_{v \in V} \theta(t_{stop} - t_v - \tau) \left( 1 - \frac{t_v + \tau}{t_{stop}} \right) ,    (2)

where θ(·) denotes the Heaviside function, and a is a normalizing constant. p is plotted, along with
the p' values that differ most from p, in Fig. 1(a) (here a is chosen to make p coincide with p' for
small τ values rather than to normalize p). We note that p is straighter than p' in the log-log
scale (for at least the e-mail and pussokram.com curves), which suggests a power-law like behavior
over a considerable range. (Of course there is eventually a cut-off, from the human life time if
nothing else.) The e-print curve has a peculiar bend as it seems to shift exponent around τ ≈ 300
days, an observation we hope future studies can explain. There is a conspicuous irregularity around
τ ≈ 1 day for the e-mail and pussokram.com curves. This was also observed in Ref. (9) and explained
as an effect of people's everyday routines: the Kiel students read and reply to their e-mails at the
same hour as the day before, the pussokram.com members log in after school or work, and so on. This
effect is more visible in a linear scale, see Figs. 1(b) and (c). For the e-mail curve the peak at
seven days is larger than the surrounding peaks, indicating that some e-mails are associated with
weekly routines among the Kiel students and their contacts. This one-week peak cannot be seen in the
pussokram.com curve, possibly reflecting that business (or university studies) has more weekly
scheduled routines than leisure does.

Now we turn to the more central question about the speed of relationship cessation. Our central
quantity is the number of relationships existing at time t_0 that still remain at time t (we assume
0 < t_0 < t < t_stop), μ(t_0, t). This quantity can crudely be approximated with the number of
relationships at t that existed at t_0 and that will occur again before t_stop, μ'(t_0, t). The
error in the approximation will be rather large for t close to t_stop. But, just as above, one can
improve the approximation considerably. If one assumes that the response time distribution p(τ)
applies to all relationships regardless of whether the relationship is new or old, then, during a
time interval Δt, the change of μ can be written

    \Delta\mu = \Delta\mu' + \mu \, \Delta\pi ,    (3)

where π(t) is the probability that a relationship that has its last recorded contact at time t
actually continues after t_stop,

    [...]    (4)

where the sum is over the bins of the p(τ) histogram. A change of variables gives

    \Delta\pi(t) = \Delta t \, p(t_{stop} - t) ,    (5)

and finally a formula for integrating μ:

    \mu(t + \Delta t) = \frac{\Delta\mu'(t + \Delta t) + \mu(t)}{1 - \Delta t \, p(t_{stop} - t)} .    (6)

We also need the factor a of Eq. (2), which is hard to estimate since we do not know the long-term
behavior of p(τ) exactly. However, we note that for certain a the μ(t) curves are rather straight in
a lin-log plot, see Fig. 2 (as opposed to the e-mail and pussokram.com curves, the e-print curve
decays so slowly that a power-law form of μ(t) cannot be ruled out). This means that the
characteristic duration time is well-defined: fitting to an exponential A exp(-t/t_dur) (A and t_dur
are the two degrees of freedom) gives the characteristic durations t_dur of relationships displayed
in Table I. To be able to approximate the network of ongoing contacts with the network of contacts
that have happened and will happen again, one would like t_dur ≪ t_stop to hold. We see that for the
pussokram.com data t_stop is about four times as large as t_dur, which enables us to draw some
conclusions using this approximation. The effective sampling times of the e-print and e-mail data
are, however, so short that we exclude these from the remaining section of this paper.

Now we take a brief look at the time evolution of the network sizes: n, the number of active actors
at time t, and m, the number of edges in our approximate network of ongoing relationships. If the
number of active users increases during the sampling period, the time evolution of n and m should be
right-skewed, and this is indeed true for the e-print data (as seen in Figs. 3(a) and (b)). The
e-mail and pussokram.com curves are more symmetric (the pussokram.com curve is indeed slightly
left-skewed). We note that for pussokram.com, m is much less than M. The kinks of the e-mail curves
are due to group or spam e-mails; the other quantities (m, p, and so on) are not affected by this.

FIG. 2 The speed of network change. μ' is the number of edges at time t_0 that still are present at
time t > t_0. We choose t_0 = 0.2t, [...]

FIG. 3 (a) shows the number of vertices n(t) and (b) shows the number of edges m(t) of the networks
of contacts that have occurred and will occur again.



IV. NETWORK STRUCTURE AND STRUCTURAL FLUCTUATIONS

Now we turn to the structure of the network of ongoing contacts, and the fluctuations of the
structural measures. In this section we only use the pussokram.com data (due to, as mentioned above,
the large effective sampling time for this data set). We focus on three quantities that recently
have received much attention. The first structural measure is the distribution of degree (number of
edges to a vertex). The second quantity is the clustering coefficient C(G), where we use the
traditional sociological definition

    C(G) = C_3(G) / P_3(G) ,    (7)

where C_3(G) is the number of representations of every 3-cycle (triangle) of G,^3 and P_3(G) is the
number of representations of 3-paths. The third quantity is the assortative mixing coefficient (12)

    r = \frac{\langle k_1 k_2 \rangle - \langle k_1 \rangle \langle k_2 \rangle}{\sqrt{\langle k_1^2 \rangle - \langle k_1 \rangle^2} \, \sqrt{\langle k_2^2 \rangle - \langle k_2 \rangle^2}} ,    (8)

where averages are taken over E, and k_1 and k_2 are the degrees of an edge's first and second
arguments as they appear in E.
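The two structural measures can be computed directly from an edge list, as the following Python sketch illustrates. It is a minimal sketch under stated assumptions, not the code behind the figures: the function names are hypothetical, edges are assumed to be unique (v_a, v_b) tuples without self-loops, a 3-path is taken to be a path of three vertices, and each undirected edge is counted in both orientations when averaging over E so that the coefficient does not depend on how the pairs are ordered.

```python
from itertools import combinations
from math import sqrt


def neighbor_sets(edges):
    """Map every vertex to the set of its neighbors."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def clustering_coefficient(edges):
    """C = C_3 / P_3 with ordered representations of triangles and 3-paths."""
    nbrs = neighbor_sets(edges)
    c3 = 0
    p3 = 0
    for v, around in nbrs.items():
        k = len(around)
        p3 += k * (k - 1)  # ordered pairs of distinct neighbors centered on v
        for a, b in combinations(around, 2):
            if b in nbrs[a]:
                c3 += 2  # closed paths (a, v, b) and (b, v, a)
    return c3 / p3 if p3 else 0.0


def assortative_mixing(edges):
    """Pearson correlation of the degrees at the two ends of an edge."""
    nbrs = neighbor_sets(edges)
    pairs = []
    for a, b in edges:
        pairs.append((len(nbrs[a]), len(nbrs[b])))
        pairs.append((len(nbrs[b]), len(nbrs[a])))  # assumption: both orientations
    n = len(pairs)
    mean_1 = sum(k1 for k1, _ in pairs) / n
    mean_2 = sum(k2 for _, k2 in pairs) / n
    mean_12 = sum(k1 * k2 for k1, k2 in pairs) / n
    var_1 = sum(k1 * k1 for k1, _ in pairs) / n - mean_1 ** 2
    var_2 = sum(k2 * k2 for _, k2 in pairs) / n - mean_2 ** 2
    if var_1 <= 0.0 or var_2 <= 0.0:
        return float("nan")  # undefined when all degrees are equal
    return (mean_12 - mean_1 * mean_2) / (sqrt(var_1) * sqrt(var_2))
```

With this counting every triangle contributes six representations to C_3, so a single triangle gives C = 1, and a star graph gives r = -1.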

The cumulative degree distribution of our approximate network of ongoing relationships, along with
the corresponding data for the network of accumulated contacts, is plotted in Fig. 4(a). Just as for
the accumulated network, our approximate network of ongoing relationships has a fat-tailed degree
distribution, but the network of ongoing relationships fits better to a single power law with [...].
The stronger downward bend of P(k) for accumulated social contacts has been observed earlier
(10; 11); maybe this larger correction to a power-law form is due to inactive edges. We note that
even if the degree distribution fits very well to that of the Barabási-Albert model (4),^4 the
central ingredient in the Barabási-Albert model (the "preferential attachment") does not apply
directly to the pussokram.com community. Preferential attachment means that a vertex acquires new
edges with a rate proportional to its degree, but in the pussokram.com community the degree of a
member is invisible to others (10).

^3 If (v_1, v_2, v_3) is a triangle, then (v_2, v_3, v_1) is another representation of the same
triangle. So the number of distinct triangles is C_3(G)/6.
^4 For a case study of papers citing Ref. (4), see Ref. [...].

FIG. 4 Structure of the network of ongoing contacts. (a) shows the cumulative degree distribution of
the network of ongoing contacts along with the degree distribution of the network of accumulated
contacts. Error bars are shown if larger than the symbol size. The errors are calculated assuming
that networks at times differing by more than t_dur are independent. (b) shows the clustering
coefficient as a function of time for networks of ongoing and accumulated contacts. (c) shows the
assortative mixing coefficient as a function of time.

Next we turn to the time evolution of C and r. In Figs. 4(b) and (c) these are displayed for the
whole sampling time. The earliest and latest times can, of course, be affected by the proximity to
the borders of the sampling time frame; for our discussion we focus on the interval
[t_dur, t_stop - t_dur] ≈ [129, 383] days. We see that both C and r are of the same order of
magnitude for the networks of ongoing and accumulated contacts. These values of C and r are rather
neutral in the sense that they can be expected from a random network with a skewed degree
distribution. The fluctuations are rather large, especially for the clustering coefficient (with a
standard deviation of around half the average value). An intriguing question for future studies is
how dynamical systems on the networks are affected by strong structural fluctuations. Slightly
outside our interval, at t ≈ 395, there is an upward jump in both C (from 0.023 to 0.041) and r
(from -0.057 to -0.043) that is the result of a new contact between two of the most central actors.
Such a new edge introduces a new triangle for every common neighbor of the two vertices, and can
thus increase C substantially, as two high-degree actors may have many neighbors in common. Such an
event will, by definition, also give a large positive contribution to the assortative mixing. We can
expect sudden jumps in many structural quantities for networks of ongoing relationships with
fat-tailed degree distributions, as the rare event of an edge appearing or disappearing between two
of the most connected vertices will affect many structural measures (various kinds of centrality
measures (18) are probably even more sensitive to such events). Unfortunately the sampling time is
too short, despite the fast pussokram.com dynamics, to get good statistics for the autocorrelation
function of C(t) and r(t) (it is consistent with a characteristic time of decay similar to t_dur).
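The mechanism behind such a jump can be illustrated on a toy example in Python. The sketch below is only an illustration with hypothetical function names and a made-up graph of two hubs sharing four neighbors; it is not derived from the pussokram.com data. It counts the triangles created by one new edge and evaluates C = C_3/P_3 before and after.

```python
def neighbor_sets(edges):
    """Map every vertex to the set of its neighbors."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def clustering(nbrs):
    """C = C_3 / P_3, both counted as ordered representations."""
    # Each adjacent ordered pair (v, w) closes one triangle per common neighbor.
    c3 = sum(len(nbrs[v] & nbrs[w]) for v in nbrs for w in nbrs[v])
    p3 = sum(len(around) * (len(around) - 1) for around in nbrs.values())
    return c3 / p3 if p3 else 0.0


def jump_from_new_edge(edges, u, w):
    """Return (new triangles, C before, C after) for adding the edge (u, w)."""
    nbrs = neighbor_sets(edges)
    before = clustering(nbrs)
    new_triangles = len(nbrs.get(u, set()) & nbrs.get(w, set()))
    nbrs.setdefault(u, set()).add(w)
    nbrs.setdefault(w, set()).add(u)
    return new_triangles, before, clustering(nbrs)
```

For two hubs that are each connected to the same four vertices but not to each other, the single new edge between the hubs creates four triangles and raises C from 0 to 0.5.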



V. SUMMARY AND CONCLUSIONS 

In this paper we investigate networks of ongoing contacts from three large sets of social
interaction data. We study the response time distribution and the distribution of relationship
duration. We reanalyze these quantities to compensate for the finite sampling time by supposing that
the response time distribution is the same for all relationships, and the same throughout the
duration of the relationship. We find a response time distribution that has a power-law like shape
in the large scale, but has an informative small-scale structure reflecting daily and weekly
routines. The distribution of relationship duration is consistent with an exponential decay. This
indicates that there is a well-defined characteristic duration time of a relationship, t_dur, and
that if t_dur is much less than the sampling time t_stop, the network of ongoing contacts can be
reasonably well approximated by the network of contacts that have happened and will happen again.
For one of our data sets, that of the Internet community pussokram.com, we have 4 t_dur ≈ t_stop.
For this data set we compare the approximate network of ongoing contacts with networks of
accumulated contacts, the common way of constructing social networks from interaction data. We find
a degree distribution that fits much better to a power law for the network of ongoing contacts than
for the network of accumulated contacts. The clustering and assortative mixing coefficients are of
the same order, which, to some extent, justifies the use of networks of accumulated contacts as a
proxy for networks of ongoing contacts. The fluctuations in these quantities are, however, rather
large, a fact that may have important consequences for dynamical systems. We hope these results will
inspire more extensive longitudinal studies of interaction networks with fast dynamics, as
well-converged data for relationship duration distribution and autocorrelation functions of
structural quantities are within reach. We also point out the interplay between dynamical systems on
the networks and the structural fluctuations as an interesting area for future studies.
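For readers who wish to repeat the reanalysis of the relationship duration, the integration step of Eq. (6) in Sec. III can be sketched in Python as follows. This is a sketch under explicit assumptions rather than the procedure actually used: the function name integrate_mu is hypothetical, mu_prime is assumed to hold the recorded values at equally spaced times t_0, t_0 + Δt, and so on, the increment in the numerator of Eq. (6) is read as the difference between consecutive recorded values, p is assumed to be a callable returning the compensated response time density including the factor a, and μ is assumed to coincide with the recorded value at t_0.

```python
def integrate_mu(mu_prime, p, t0, dt, t_stop):
    """Step Eq. (6) forward from t0 in increments of dt.

    mu_prime: recorded values mu'(t0, t) at t = t0, t0 + dt, ... (hypothetical format).
    p: callable giving the response time density p(tau) per unit time.
    Returns the list of compensated values mu(t0, t) on the same time grid.
    """
    mu = [float(mu_prime[0])]  # assumption: mu equals mu' at t = t0
    for i in range(1, len(mu_prime)):
        t = t0 + (i - 1) * dt
        delta_mu_prime = mu_prime[i] - mu_prime[i - 1]  # assumed reading of the increment
        denominator = 1.0 - dt * p(t_stop - t)
        if denominator <= 0.0:
            raise ValueError("dt * p(t_stop - t) must stay below 1")
        mu.append((delta_mu_prime + mu[-1]) / denominator)
    return mu
```

When p vanishes for all relevant τ, the denominator is one and the compensated curve reduces to the recorded one; the correction matters only where p(t_stop - t) is appreciable, that is, for t close to t_stop.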